Random Forest Visualization
Abstract
Classification is the process of assigning a class label to an observation based on its properties or attributes. Applying a classification algorithm to a data set produces a model, and studying that model can yield insights into the structure of the data set. Which insights are available depends on the kind of model. In this work, a Random Forest model is used to analyze data and is explored by means of visualization. The results include this report and the prototype of a visualization analysis tool. The tool, named ReFINE (Random Forest INspEctor), consists of several visualizations of a Random Forest model. ReFINE provides visualizations for the component trees of a Random Forest and for its special features: the proximity measure, variable importance, interactions, and prototypes. Each of these aspects is presented with a different visualization technique; all the visualizations are integrated to show the connections between them and to allow a user to discover patterns in data sets. The effectiveness of the approach is validated on various data sets, including generated and real data. As a result, ReFINE allows a user to investigate the data, its most important variables, their split points, the connections between instances, and their distribution.
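As a rough illustration of the proximity measure mentioned above, the sketch below uses the standard Random Forest definition: the proximity of two instances is the fraction of trees in which they end up in the same leaf. The abstract does not say how ReFINE computes or stores proximities, so the function name `proximity_matrix`, the input layout, and the toy leaf assignments are all assumptions made for illustration.

```python
from itertools import combinations


def proximity_matrix(leaf_ids):
    """Return pairwise proximities for a forest.

    leaf_ids[i][t] is the identifier of the leaf that instance i
    reaches in tree t (hypothetical input layout).
    """
    n = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        prox[i][i] = 1.0  # an instance always shares a leaf with itself
    for i, j in combinations(range(n), 2):
        # Count the trees in which both instances land in the same leaf.
        shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Made-up leaf assignments for 4 instances in a forest of 3 trees.
toy_leaf_ids = [
    [1, 4, 2],
    [1, 4, 3],
    [2, 5, 3],
    [2, 5, 2],
]
toy_prox = proximity_matrix(toy_leaf_ids)
```

A matrix like `toy_prox` is the kind of input that proximity-based views (for example a multidimensional scaling plot or a weighted network) would start from.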
Similar resources
Predicting Confusion in Information Visualization from Eye Tracking and Interaction Data
Confusion has been found to hinder user experience with visualizations. If confusion could be predicted and resolved in real time, user experience and satisfaction would greatly improve. In this paper, we focus on predicting occurrences of confusion during the interaction with a visualization using eye tracking and mouse data. The data was collected during a user study with ValueChart, an inter...
Visualizing A Walk Through the Random Forest
Well-designed visualizations have an important role to play to aid in the public’s understanding of algorithms. This work presents a set of design principles for using visualization to explain machine learning algorithms specifically, and demonstrates these principles applied to the operations of the random forest algorithm.
Visualizing Random Forest’s Prediction Results
The current paper proposes a new visualization tool to help check the quality of the random forest predictions by plotting the proximity matrix as weighted networks. This new visualization technique will be compared with the traditional multidimensional scale plot. The present paper also introduces a new accuracy index (proportion of misplaced cases), and compares it to total accuracy, sensitiv...
Random Forest Ensemble Visualization
The Random Forest model for machine learning has become a very popular data mining algorithm due to its high predictive accuracy as well as its simplicity in execution. The downside is that the model is difficult to interpret. The model consists of a collection of classification trees. Our proposed visualization aggregates the collection of trees based on the number of feature appearances at node ...
Cyber Security Network Anomaly Detection and Visualization
In this Major Qualifying Project, we present a novel anomaly detection system for computer networks and a visualization system to help users explore network captures. The detection algorithm uses Robust Principal Component Analysis to produce a lower dimensional subspace of the original data for which a sparse matrix of outliers occurs. This low dimensional data subspace is determined by a nove...